ILP Methods for Family Trio Phasing

Authors

  • D. Brinza
  • J. He
  • W. Mao
  • K. Westbrooks
  • M. Fraser
  • R. Harrison
  • A. Zelikovsky
Abstract

In population genotyping, it is common to genotype family trios consisting of two parents and their child, since this allows haplotypes to be recovered with higher confidence. Interestingly, the available software tools are primarily intended to phase only unrelated genotypes. In this section we first formulate the problem and describe the specifics of family trio phasing, then analyze existing computational tools and discuss the pure parsimony objective. In the following section we give three integer linear program formulations and compare their runtimes on the Daly et al. [18] data.

The haplotypes of children are much harder to recover than those of parents, since we do not know which recombinations may have occurred when the parents' haplotypes were inherited by the child. Therefore, for simplicity, we assume no recombinations in child chromosomes and that exactly one child chromosome is inherited from each parent. Formally, given a set of genotypes partitioned into family trios, the Trio Phasing Problem (TPP) asks to find, for each trio, a quartet of parent haplotypes which agree with all three genotypes.

A simple logical analysis substantially decreases the uncertainty of phasing. Here a genotype value of 0 or 1 denotes a site homozygous for that allele, and 2 denotes a heterozygous site. For example, for two SNPs in a trio with parent genotypes f = 22 and m = 02 and child genotype k = 01, there is a unique feasible phasing of the parents: f1 = 10, f2 = 01, m1 = 01, m2 = 00, such that the haplotypes f2 and m1 are inherited by the child. In fact, it is not difficult to check that logical ambiguity exists only if all three genotypes have 2's at the same SNP site. Another source of ambiguity is missing data: certain SNPs for certain individuals may be unavailable due to failures during genotyping. Although the missing data rate is decreasing in the most recent data, missing entries still constitute a substantial part of the data (as much as 16% of the genotype data in Daly et al. [5] and 10% in Gabriel et al. [19]).
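The per-site logical analysis described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`genotype`, `site_options`, `phase_trio`) are hypothetical, genotypes are assumed to be strings over 0/1/2 with 2 meaning heterozygous, missing data is not handled, and the no-recombination assumption is what lets each site be resolved independently with fixed "transmitted" and "untransmitted" roles.

```python
from itertools import product


def genotype(a, b):
    """Genotype code of an allele pair: 0 or 1 if homozygous, 2 if heterozygous."""
    return a if a == b else 2


def site_options(f, m, k):
    """All allele assignments (f_t, f_u, m_t, m_u) consistent with one SNP site.

    f_t / m_t are the alleles transmitted to the child by father / mother,
    f_u / m_u are the untransmitted ones (hypothetical naming).
    """
    return [
        (ft, fu, mt, mu)
        for ft, fu, mt, mu in product((0, 1), repeat=4)
        if genotype(ft, fu) == f
        and genotype(mt, mu) == m
        and genotype(ft, mt) == k
    ]


def phase_trio(father, mother, child):
    """Return (f_transmitted, f_untransmitted, m_transmitted, m_untransmitted).

    Sites that cannot be resolved logically are marked with '?'.
    """
    haps = ["", "", "", ""]
    for f, m, k in zip(father, mother, child):
        opts = site_options(int(f), int(m), int(k))
        if not opts:
            raise ValueError("genotypes violate Mendelian inheritance")
        for i in range(4):
            alleles = {o[i] for o in opts}
            haps[i] += str(alleles.pop()) if len(alleles) == 1 else "?"
    return tuple(haps)


print(phase_trio("22", "02", "01"))
```

For the example in the text, the transmitted haplotypes correspond to f2 and m1 and the untransmitted ones to f1 and m2. Enumerating all 27 single-site genotype combinations with `site_options` is also a quick way to check the claim that ambiguity arises only when all three genotypes are 2 at the same site.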
As mentioned above, one of the goals of this study has been to design and verify discrimination algorithms for the data of [5], which is one of the few publicly available large-scale case/control genotype data sets. Since this data is given in family trios, we tried several well-known computational phasing methods on it in an attempt to find a feasible solution for the TPP. Surprisingly, all the methods we tried give infeasible solutions with a high inconsistency rate. The error rate is measured as the ratio of the number of inconsistently phased SNPs to the total number of ambiguous SNPs, i.e. those that are either missing or cannot be logically inferred. Note that the error rate does not rely on the assumption that no recombinations happen in the children. The Phamily tool, based on the well-known phasing tool PHASE, is intended to phase trio families [2]. It first uses the logical method described above to infer the SNPs in the parental haplotypes. Then the children's genotypes are discarded, while the parental genotypes and known haplotypes are passed


Similar resources

Phasing and Missing Data Recovery in Family Trios

Although there exist many phasing methods for unrelated adults or pedigrees, phasing and missing data recovery for data representing family trios is lagging behind. This paper is an attempt to fill this gap by considering the following problem. Given a set of genotypes partitioned into family trios, find for each trio a quartet of parent haplotypes which agree with all three genotypes and recov...


Family trio phasing and missing data recovery

Although there exist many phasing methods for unrelated adults or pedigrees, phasing and missing data recovery for data representing family trios is lagging behind. This paper is an attempt to fill this gap by considering the following problem. Given a set of genotypes partitioned into family trios, find for each trio a quartet of parent/offspring haplotypes explaining each trio without recombi...


A New ILP Model for Identical Parallel-Machine Scheduling with Family Setup Times Minimizing the Total Weighted Flow Time by a Genetic Algorithm

This paper presents a novel, integer-linear programming (ILP) model for an identical parallel-machine scheduling problem with family setup times that minimizes the total weighted flow time (TWFT). Some researchers have addressed parallel-machine scheduling problems in the literature over the last three decades. However, the existing studies have been limited to the research of independent jobs,...


Embryo genome profiling by single-cell sequencing for preimplantation genetic diagnosis in a β-thalassemia family.

BACKGROUND: The embryonic genome, including genotypes and haplotypes, contains all the information for preimplantation genetic diagnosis, representing great potential for mendelian disorder carriers to conceive healthy babies. METHODS: We developed a strategy to obtain the full embryonic genome for a β-thalassemia-carrier couple to have a healthy second baby. We carried out sequencing for singl...


Read-based phasing of related individuals

MOTIVATION: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potentia...




Publication date: 2006